week 7: multilevel models
multilevel tadpoles
multilevel models
We’re starting our unit on multilevel models, which can be thought of as models that “remember” features of clusters of data as they learn about all the clusters. The model will pool information across clusters (e.g., our estimates about cluster A will be informed in part by clusters B, C, and D). This tends to improve estimates about each cluster. Here are some other benefits of multilevel modeling:
improved estimates for repeated sampling. When the same units are measured repeatedly, a single-level model will either overfit (treating every cluster as unrelated) or underfit (ignoring clusters entirely).
improved estimates for imbalance in sampling. prevent over-sampled clusters from dominating inference, while also balancing the fact that larger clusters have more information.
estimates of variation. model variation explicitly!
avoid averaging, retain variation. averaging manufactures false confidence (artificially inflates precision) and introduces arbitrary data transformations.
Multilevel modeling should be your default approach.
review (or introduction?)
(Ignore probability and estimation for now)
\[
Y_i = b_0 + b_1X_{1i} + ... + \epsilon_i
\] This assumes each observation \((i)\) is independent of all other observations. But what if our data were clustered in some way? We might be able to express each observation \(i\) as belonging to a specific cluster, \(j\).
\[
Y_{ij} = b_0 + b_1X_{1ij} + ... + \epsilon_{ij}
\]
But what is the problem with this?
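One way to see the problem is a quick simulation (hypothetical data, not the course dataset): when observations share a cluster, their residuals from a single-level model are correlated, violating the independence assumption.

```r
# hypothetical sketch: simulate clustered data, then ignore the clustering
set.seed(1)
j <- rep(1:20, each = 10)        # 20 clusters, 10 observations each
U <- rnorm(20, 0, 1)[j]          # cluster-level deviations (tau = 1)
y <- 2 + U + rnorm(200, 0, 1)    # observation-level noise (sigma = 1)

res <- resid(lm(y ~ 1))          # single-level (intercept-only) residuals

# residuals from two observations in the same cluster are correlated;
# in expectation this is tau^2 / (tau^2 + sigma^2) = .5, not 0
cor(res[seq(1, 200, 10)], res[seq(2, 200, 10)])
```

The single-level model assumes that correlation is zero; the multilevel models below estimate it instead.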
(Still non-Bayesian)
Intercept-only model
\[\begin{align*}
\text{Level 1}&\\
Y_{ij} &= \beta_{0j} + \epsilon_{ij}\\
\text{Level 2}&\\
\beta_{0j} &= \gamma_{00} + U_{0j} \\
U_{0j} &\sim \text{Normal}(0, \tau^2_{00}) \\
\epsilon_{ij} &\sim \text{Normal}(0, \sigma^2) \\
\end{align*}\]
We can rewrite this as the “combined” model:
\[
Y_{ij} = \gamma_{00} + U_{0j} + \epsilon_{ij}\\
\] Level 1 is where you have data that repeats within your grouping or clustering variable. Is your cluster classrooms? Then students are Level 1. Is your cluster people? Then observations are Level 1.
Level 2 takes the parameters at Level 1 and decomposes them into a fixed component \((\gamma)\) that reflects the average and, if desired, the individual deviations around that fixed effect \((U)\).
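This decomposition also answers the earlier question about clustered data: under the combined model, two observations from the same cluster \(j\) share the same \(U_{0j}\), so they are correlated rather than independent.

\[
\text{Cov}(Y_{ij}, Y_{i'j}) = \text{Var}(U_{0j}) = \tau^2_{00}, \qquad
\text{Cor}(Y_{ij}, Y_{i'j}) = \frac{\tau^2_{00}}{\tau^2_{00} + \sigma^2}
\]

A single-level model assumes this within-cluster correlation is zero; the multilevel model estimates it.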
example: multilevel people
data_path = "https://raw.githubusercontent.com/sjweston/uobayes/refs/heads/main/files/data/external_data/mlm.csv"
d <- read.csv(data_path)
rethinking::precis(d)
mean sd 5.5% 94.5% histogram
id 119.9954338 51.04319970 47.890000 209.020000 ▁▂▂▅▇▅▃▅▂▂▂▁
group NaN NA NA NA
week 1.3981481 1.45595690 0.000000 3.000000 ▇▃▁▂▁▅▁▁▁▁▁▁
wave 1.7899543 0.79082439 1.000000 3.000000 ▇▇▁▂▁▁
con 0.1911393 0.07427626 0.083850 0.309936 ▁▂▃▇▅▃▁▁▁▁
dan 0.1946091 0.06434335 0.096652 0.296208 ▁▁▅▇▇▃▁▁
set.seed(114)
sample_id = sample(unique(d$id), replace = F, size = 20)

d %>%
  filter(id %in% sample_id) %>%
  ggplot(aes(x = week, y = con, group = id, fill = id)) +
  geom_point(aes(color = factor(id))) +
  stat_smooth(aes(color = factor(id)),
              method = "lm", se = FALSE) +
  theme(legend.position = "none")
What if we wanted to estimate each person’s conscientiousness score? One method would be to simply average scores for each person, but we lose a lot of information that way. Another option would be to treat each person as a group and model scores as a function of group. We can do this using an unpooled model.
\[\begin{align*}
\text{con}_i &\sim \text{Normal}(\mu_i,\sigma) \\
\mu_i &= \alpha_{\text{id}[i]} \\
\alpha_j &\sim \text{Normal}(0, 1.5) \text{ for }j=1,...,91 \\
\sigma &\sim \text{Exponential}(1)
\end{align*}\]
d$id.f = as.factor(d$id)

m1 <- brm(
  data = d,
  family = gaussian,
  bf(con ~ 0 + a,
     a ~ 0 + id.f,
     nl = TRUE),
  prior = c(prior(normal(0, 1.5), class = b, nlpar = a),
            prior(exponential(1), class = sigma)),
  iter = 2000, warmup = 1000, chains = 4, cores = 4, seed = 9,
  file = here("files/models/m71.1")
)
Family: gaussian
Links: mu = identity; sigma = identity
Formula: con ~ 0 + a
a ~ 0 + id.f
Data: d (Number of observations: 219)
Draws: 4 chains, each with iter = 2000; warmup = 1000; thin = 1;
total post-warmup draws = 4000
Regression Coefficients:
Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
a_id.f6 0.19 0.03 0.14 0.25 1.00 5625 3051
a_id.f29 0.12 0.03 0.06 0.19 1.00 6068 2824
a_id.f34 0.11 0.03 0.04 0.17 1.00 5718 3035
a_id.f36 0.16 0.03 0.09 0.23 1.00 6119 2895
a_id.f37 0.19 0.03 0.13 0.24 1.00 5445 3015
a_id.f48 0.22 0.03 0.16 0.27 1.00 5453 3081
a_id.f53 0.19 0.03 0.12 0.26 1.00 5551 2922
a_id.f54 0.25 0.03 0.19 0.30 1.00 5960 3024
a_id.f58 0.23 0.03 0.18 0.29 1.00 6524 2736
a_id.f61 0.09 0.03 0.02 0.16 1.00 5749 2323
a_id.f66 0.24 0.03 0.18 0.29 1.00 6296 2809
a_id.f67 0.20 0.02 0.15 0.25 1.00 6392 2847
a_id.f69 0.23 0.05 0.14 0.33 1.00 6454 3110
a_id.f71 0.06 0.03 0.01 0.11 1.00 5961 2871
a_id.f74 0.19 0.05 0.09 0.28 1.00 5496 2875
a_id.f75 0.27 0.03 0.20 0.34 1.00 6453 2771
a_id.f76 0.10 0.03 0.05 0.15 1.00 5279 2847
a_id.f78 0.24 0.03 0.19 0.30 1.00 6529 2775
a_id.f79 0.13 0.03 0.07 0.20 1.00 5353 3025
a_id.f80 0.14 0.03 0.07 0.20 1.00 4992 3154
a_id.f81 0.24 0.03 0.18 0.30 1.00 5350 2923
a_id.f82 0.33 0.02 0.28 0.37 1.00 5953 3181
a_id.f85 0.16 0.03 0.09 0.22 1.00 5285 2947
a_id.f86 0.12 0.03 0.05 0.19 1.00 5730 2870
a_id.f87 0.11 0.03 0.04 0.18 1.00 6111 3085
a_id.f89 0.14 0.03 0.08 0.19 1.00 6114 2961
a_id.f91 0.17 0.03 0.12 0.23 1.00 6302 3053
a_id.f92 0.20 0.03 0.15 0.26 1.00 5362 2901
a_id.f93 0.25 0.03 0.20 0.31 1.00 5458 3163
a_id.f94 0.12 0.03 0.06 0.18 1.00 5703 2502
a_id.f96 0.23 0.03 0.17 0.30 1.00 5478 3106
a_id.f97 0.25 0.02 0.20 0.29 1.00 5962 3126
a_id.f98 0.14 0.02 0.09 0.19 1.00 5907 2922
a_id.f99 0.18 0.02 0.13 0.23 1.00 5436 3042
a_id.f101 0.24 0.03 0.19 0.30 1.00 5774 2770
a_id.f102 0.23 0.05 0.13 0.33 1.00 5546 2841
a_id.f103 0.15 0.03 0.08 0.22 1.00 6545 2763
a_id.f104 0.17 0.03 0.11 0.24 1.00 5815 2877
a_id.f105 0.16 0.03 0.11 0.22 1.00 6403 2823
a_id.f106 0.20 0.03 0.14 0.25 1.00 5788 2810
a_id.f110 0.18 0.03 0.13 0.24 1.00 5848 2835
a_id.f112 0.09 0.03 0.04 0.15 1.00 6419 2807
a_id.f114 0.13 0.03 0.08 0.19 1.00 6196 3212
a_id.f115 0.14 0.03 0.09 0.20 1.00 5733 2479
a_id.f116 0.14 0.03 0.09 0.20 1.00 5518 2592
a_id.f120 0.35 0.03 0.29 0.42 1.00 6731 2725
a_id.f122 0.17 0.03 0.11 0.23 1.00 6214 2940
a_id.f125 0.20 0.03 0.14 0.25 1.00 5916 3061
a_id.f127 0.23 0.03 0.17 0.28 1.00 5723 3174
a_id.f129 0.13 0.03 0.06 0.20 1.00 5144 2627
a_id.f135 0.30 0.03 0.24 0.35 1.00 4810 2702
a_id.f136 0.29 0.03 0.24 0.35 1.00 5689 3004
a_id.f137 0.29 0.03 0.23 0.36 1.00 5267 3053
a_id.f140 0.13 0.03 0.06 0.19 1.00 5082 2662
a_id.f141 0.19 0.03 0.12 0.26 1.00 6459 2535
a_id.f142 0.18 0.03 0.11 0.24 1.00 5489 3121
a_id.f143 0.31 0.03 0.24 0.38 1.00 4659 2904
a_id.f144 0.17 0.03 0.11 0.24 1.00 5522 2935
a_id.f146 0.21 0.03 0.14 0.27 1.00 5840 3031
a_id.f149 0.23 0.03 0.16 0.29 1.00 5598 2872
a_id.f150 0.20 0.04 0.13 0.27 1.00 5041 3086
a_id.f152 0.22 0.03 0.16 0.29 1.00 5956 2740
a_id.f153 0.15 0.03 0.08 0.22 1.00 5164 2357
a_id.f155 0.14 0.03 0.08 0.20 1.00 6107 3095
a_id.f156 0.14 0.03 0.08 0.19 1.00 4763 3080
a_id.f159 0.12 0.03 0.06 0.19 1.00 6144 2853
a_id.f160 0.18 0.03 0.11 0.24 1.00 5525 3077
a_id.f162 0.39 0.03 0.34 0.45 1.00 5761 2859
a_id.f163 0.13 0.03 0.07 0.20 1.00 5954 2910
a_id.f165 0.20 0.03 0.13 0.26 1.00 5551 2634
a_id.f167 0.13 0.03 0.06 0.20 1.00 6029 2934
a_id.f169 0.24 0.03 0.18 0.31 1.00 5742 2835
a_id.f171 0.23 0.04 0.16 0.30 1.00 6125 2801
a_id.f174 0.26 0.03 0.20 0.33 1.00 5144 2945
a_id.f182 0.19 0.03 0.12 0.26 1.00 5523 2574
a_id.f187 0.06 0.03 -0.00 0.13 1.00 5601 3169
a_id.f189 0.22 0.03 0.16 0.29 1.00 6924 2991
a_id.f190 0.13 0.03 0.06 0.19 1.00 5857 3073
a_id.f193 0.21 0.03 0.14 0.28 1.00 5658 2451
a_id.f194 0.12 0.03 0.05 0.19 1.00 5872 3228
a_id.f201 0.19 0.03 0.12 0.25 1.00 6312 2938
a_id.f204 0.29 0.03 0.22 0.35 1.00 6373 3108
a_id.f205 0.19 0.03 0.13 0.26 1.00 6528 3011
a_id.f208 0.20 0.03 0.13 0.27 1.00 6211 2455
a_id.f209 0.21 0.03 0.15 0.28 1.00 6647 2275
a_id.f211 0.14 0.03 0.07 0.20 1.00 5620 2983
a_id.f214 0.28 0.03 0.21 0.35 1.00 5602 2885
a_id.f219 0.23 0.03 0.16 0.30 1.00 6497 2666
a_id.f222 0.14 0.03 0.07 0.20 1.00 6280 3085
a_id.f223 0.17 0.03 0.10 0.23 1.00 5317 3186
a_id.f229 0.14 0.03 0.07 0.21 1.00 5391 2633
Further Distributional Parameters:
Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
sigma 0.05 0.00 0.04 0.05 1.00 1869 2510
Draws were sampled using sampling(NUTS). For each parameter, Bulk_ESS
and Tail_ESS are effective sample size measures, and Rhat is the potential
scale reduction factor on split chains (at convergence, Rhat = 1).
This is inefficient, in that the model treats each person as entirely separate. Let’s try a partial-pooling model.
\[\begin{align*}
\text{con}_i &\sim \text{Normal}(\mu_i, \sigma) \\
\mu_i &= \alpha_{\text{id}[i]} \\
\alpha_j &\sim \text{Normal}(\bar{\alpha}, \sigma_{\alpha}) \text{ for }j=1,...,91\\
\bar{\alpha} &\sim \text{Normal}(0, 1.5)\\
\sigma_{\alpha} &\sim \text{Exponential}(1) \\
\sigma &\sim \text{Exponential}(1) \\
\end{align*}\]
m2 <- brm(
  data = d,
  family = gaussian,
  con ~ 1 + (1 | id),
  prior = c(prior(normal(0, 1.5), class = Intercept),
            prior(exponential(1), class = sd),
            prior(exponential(1), class = sigma)),
  iter = 2000, warmup = 1000, chains = 4, cores = 4, seed = 9,
  file = here("files/models/m71.2")
)
Family: gaussian
Links: mu = identity; sigma = identity
Formula: con ~ 1 + (1 | id)
Data: d (Number of observations: 219)
Draws: 4 chains, each with iter = 2000; warmup = 1000; thin = 1;
total post-warmup draws = 4000
Multilevel Hyperparameters:
~id (Number of levels: 91)
Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
sd(Intercept) 0.06 0.01 0.05 0.07 1.00 1515 2201
Regression Coefficients:
Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
Intercept 0.19 0.01 0.18 0.20 1.00 2210 2482
Further Distributional Parameters:
Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
sigma 0.05 0.00 0.04 0.05 1.00 2942 2835
Draws were sampled using sampling(NUTS). For each parameter, Bulk_ESS
and Tail_ESS are effective sample size measures, and Rhat is the potential
scale reduction factor on split chains (at convergence, Rhat = 1).
how many parameters does each model have?
[1] "b_a_id.f6" "b_a_id.f29" "b_a_id.f34" "b_a_id.f36"
[5] "b_a_id.f37" "b_a_id.f48" "b_a_id.f53" "b_a_id.f54"
[9] "b_a_id.f58" "b_a_id.f61" "b_a_id.f66" "b_a_id.f67"
[13] "b_a_id.f69" "b_a_id.f71" "b_a_id.f74" "b_a_id.f75"
[17] "b_a_id.f76" "b_a_id.f78" "b_a_id.f79" "b_a_id.f80"
[21] "b_a_id.f81" "b_a_id.f82" "b_a_id.f85" "b_a_id.f86"
[25] "b_a_id.f87" "b_a_id.f89" "b_a_id.f91" "b_a_id.f92"
[29] "b_a_id.f93" "b_a_id.f94" "b_a_id.f96" "b_a_id.f97"
[33] "b_a_id.f98" "b_a_id.f99" "b_a_id.f101" "b_a_id.f102"
[37] "b_a_id.f103" "b_a_id.f104" "b_a_id.f105" "b_a_id.f106"
[41] "b_a_id.f110" "b_a_id.f112" "b_a_id.f114" "b_a_id.f115"
[45] "b_a_id.f116" "b_a_id.f120" "b_a_id.f122" "b_a_id.f125"
[49] "b_a_id.f127" "b_a_id.f129" "b_a_id.f135" "b_a_id.f136"
[53] "b_a_id.f137" "b_a_id.f140" "b_a_id.f141" "b_a_id.f142"
[57] "b_a_id.f143" "b_a_id.f144" "b_a_id.f146" "b_a_id.f149"
[61] "b_a_id.f150" "b_a_id.f152" "b_a_id.f153" "b_a_id.f155"
[65] "b_a_id.f156" "b_a_id.f159" "b_a_id.f160" "b_a_id.f162"
[69] "b_a_id.f163" "b_a_id.f165" "b_a_id.f167" "b_a_id.f169"
[73] "b_a_id.f171" "b_a_id.f174" "b_a_id.f182" "b_a_id.f187"
[77] "b_a_id.f189" "b_a_id.f190" "b_a_id.f193" "b_a_id.f194"
[81] "b_a_id.f201" "b_a_id.f204" "b_a_id.f205" "b_a_id.f208"
[85] "b_a_id.f209" "b_a_id.f211" "b_a_id.f214" "b_a_id.f219"
[89] "b_a_id.f222" "b_a_id.f223" "b_a_id.f229" "sigma"
[93] "lprior" "lp__" "accept_stat__" "stepsize__"
[97] "treedepth__" "n_leapfrog__" "divergent__" "energy__"
how many parameters does each model have?
get_variables(m1) %>% length()
get_variables(m2) %>% length()
What additional parameters?
m1 has a unique intercept for each participant (91 \(\alpha\)’s) and a single standard deviation of scores \((\sigma)\).
m2 estimates all of that plus a grand-mean intercept \((\bar{\alpha})\) and the variability of the person means \((\sigma_\alpha)\).
(What’s the extra one? brms lists the intercept twice: once as b_Intercept and once as the temporary Intercept parameter it uses internally during sampling.)
And yet!
m1 <- add_criterion(m1, criterion = "loo")
m2 <- add_criterion(m2, criterion = "loo")
loo_compare(m1, m2) %>% print(simplify = F)
elpd_diff se_diff elpd_loo se_elpd_loo p_loo se_p_loo looic se_looic
m2 0.0 0.0 314.5 11.0 63.9 5.1 -628.9 22.0
m1 -17.1 5.9 297.4 9.2 83.6 5.2 -594.7 18.5
Let’s visualize the differences between these models.
nd1 = distinct(d, id.f)
post1 = epred_draws(m1, nd1)

nd2 = distinct(d, id)
post2 = epred_draws(m2, nd2)

p1 = post1 %>%
  ggplot(aes(y = .epred, x = id.f)) +
  stat_gradientinterval() +
  scale_x_discrete(labels = NULL, breaks = NULL) +
  labs(x = "id", y = "con", title = "no pooling")

p2 = post2 %>%
  mutate(id = as.factor(id)) %>%
  ggplot(aes(y = .epred, x = id)) +
  stat_gradientinterval() +
  scale_x_discrete(labels = NULL, breaks = NULL) +
  labs(x = "id", y = "con", title = "partial pooling")

p1 / p2
means1 = post1 %>%
  mean_qi(.epred)
means2 = post2 %>%
  mean_qi(.epred) %>%
  mutate(id = as.factor(id))

means1 %>%
  ggplot(aes(x = id.f, y = .epred)) +
  geom_hline(aes(yintercept = mean(.epred)),
             linetype = "dashed") +
  geom_point(aes(color = "no pooling")) +
  geom_point(aes(x = id, color = "partial pooling"),
             data = means2,
             size = 2) +
  scale_color_manual(values = c("#e07a5f", "#1c5253")) +
  scale_x_discrete(breaks = NULL) +
  labs(x = "id", y = "con") +
  theme(legend.position = "top")
\[\begin{align*}
\text{con}_i &\sim \text{Normal}(\mu_i, \sigma) \\
\mu_i &= \alpha_{\text{id}[i]} \\
\alpha_j &\sim \text{Normal}(\bar{\alpha}, \sigma_{\alpha}) \text{ for }j=1,...,91\\
\bar{\alpha} &\sim \text{Normal}(.50,.25) \\
\sigma_{\alpha} &\sim \text{Exponential}(1) \\
\sigma &\sim \text{Exponential}(1) \\
\end{align*}\]
Another way to write out the expected value \(\mu_i\) is:
\[
\mu_i = \bar{\alpha} + \alpha_{\text{id}[i]}
\]
Where \(\bar{\alpha}\) is the mean of the means and \(\alpha_{\text{id}[i]}\) is the deviation of each person from the grand mean. There will be one \(\bar{\alpha}\) for the entire sample, but a different \(\alpha_{\text{id}[i]}\) for each person. With one \(\alpha\) per person, we can calculate the variability of those \(\alpha\)’s across people, as well as the residual variability within each person.
\[\begin{align*}
\text{con}_i &\sim \text{Normal}(\mu_i, \sigma) \\
\mu_i &= \bar{\alpha} + \alpha_{\text{id}[i]} \\
\alpha_j &\sim \text{Normal}(0, \sigma_{\alpha}) \\
\bar{\alpha} &\sim \text{Normal}(.50,.25) \\
\sigma &\sim \text{Exponential}(1) \\
\sigma_{\alpha} &\sim \text{Exponential}(1) \\
\end{align*}\]
In this reparameterization, the person-specific deviations are constrained to have a mean of 0, so we no longer need a separate prior mean for each person. Instead, we only need to estimate the variability \((\sigma_{\alpha})\) of these deviations.
In other words, this model is analogous to an ANOVA, in which we calculate both within- and between-group (here, person) variability. Therefore, we’ll have a standard deviation at both levels.
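A common summary of this within/between decomposition is the intraclass correlation (ICC): the proportion of total variance due to between-person differences. It isn’t computed above, but plugging in the point estimates from m2 (sd(Intercept) ≈ 0.06, sigma ≈ 0.05):

\[
\text{ICC} = \frac{\sigma^2_{\alpha}}{\sigma^2_{\alpha} + \sigma^2} \approx \frac{0.06^2}{0.06^2 + 0.05^2} \approx .59
\]

Roughly 59% of the variance in con is between people rather than within them.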
magical tidybayes
spread_draws(m2, b_Intercept, r_id[id, term])
# A tibble: 364,000 × 7
# Groups: id, term [91]
.chain .iteration .draw b_Intercept id term r_id
<int> <int> <int> <dbl> <int> <chr> <dbl>
1 1 1 1 0.184 6 Intercept 0.0319
2 1 1 1 0.184 29 Intercept -0.0178
3 1 1 1 0.184 34 Intercept -0.0763
4 1 1 1 0.184 36 Intercept -0.0297
5 1 1 1 0.184 37 Intercept 0.00844
6 1 1 1 0.184 48 Intercept 0.0446
7 1 1 1 0.184 53 Intercept 0.0224
8 1 1 1 0.184 54 Intercept 0.0957
9 1 1 1 0.184 58 Intercept 0.00863
10 1 1 1 0.184 61 Intercept -0.00858
# ℹ 363,990 more rows
spread_draws(m2, b_Intercept, r_id[id, term]) %>%
  mutate(p_mean = b_Intercept + r_id) %>%
  filter(id %in% sample_id) %>%
  ggplot(aes(x = id, y = p_mean)) +
  geom_hline(aes(yintercept = mean(d$con)),
             linetype = "dashed") +
  stat_halfeye(alpha = .5)
In this figure, we can visualize three approaches to pooling:
NO POOLING: each person’s mean is calculated from only their own data; there is no sharing of information (black dots).
COMPLETE POOLING: any differences between people are treated as random noise. The mean of the whole group is the only estimate and is assumed to apply equally to each person (dashed line).
PARTIAL POOLING: information is shared across participants, but they are allowed to differ. We regularize estimates of person-specific means toward the grand mean. The more information we have on a person, the less their estimate is regularized.
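The degree of regularization under partial pooling can be made explicit. In a normal-normal model like m2, the posterior mean for person \(j\) is (approximately) a precision-weighted average of that person’s observed mean \(\bar{y}_j\) and the grand mean \(\bar{\alpha}\):

\[
\hat{\alpha}_j \approx \frac{\frac{n_j}{\sigma^2}\bar{y}_j + \frac{1}{\sigma^2_{\alpha}}\bar{\alpha}}{\frac{n_j}{\sigma^2} + \frac{1}{\sigma^2_{\alpha}}}
\]

With many observations \(n_j\), the person’s own data dominate; with few, the estimate shrinks toward \(\bar{\alpha}\).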
actual_means = d %>%
  with_groups(id, summarise, p_mean = mean(con))

spread_draws(m2, b_Intercept, r_id[id, term]) %>%
  mutate(p_mean = b_Intercept + r_id) %>%
  mean_qi(p_mean) %>%
  filter(id %in% sample_id) %>%
  ggplot(aes(x = id, y = p_mean)) +
  geom_hline(aes(yintercept = mean(d$con)),
             linetype = "dashed") +
  geom_point(size = 2, color = "#e07a5f") +
  geom_point(data = filter(actual_means,
                           id %in% sample_id)) +
  labs(x = "id", y = "con")
writing our multilevel model
It’s common to write out multilevel models using formulas for the different levels. Level 1 is the level of your outcome, or the thing that repeats. Level 2 is the level of your groups.
\[\begin{align*}
\text{Level 1} &\\
\text{con}_{ij} &= \alpha_i + \epsilon_{ij} \\
\text{Level 2} &\\
\alpha_i &= \gamma_0 + U_i
\end{align*}\]
Some refer to \(U_i\) as a “random” or “varying” effect because it varies across groups. \(\gamma_0\) is therefore a “fixed” or “non-varying” effect because it applies to all groups.
drawing from the posterior
We’ve already seen how we can use spread_draws to get our parameter estimates.
m2 %>% spread_draws(b_Intercept, sigma, sd_id__Intercept) %>% head()
# A tibble: 6 × 6
.chain .iteration .draw b_Intercept sigma sd_id__Intercept
<int> <int> <int> <dbl> <dbl> <dbl>
1 1 1 1 0.184 0.0489 0.0641
2 1 2 2 0.180 0.0454 0.0633
3 1 3 3 0.180 0.0429 0.0708
4 1 4 4 0.177 0.0474 0.0666
5 1 5 5 0.184 0.0471 0.0617
6 1 6 6 0.191 0.0471 0.0548
m2 %>% spread_draws(r_id[id, term]) %>% head()
# A tibble: 6 × 6
# Groups: id, term [1]
id term r_id .chain .iteration .draw
<int> <chr> <dbl> <int> <int> <int>
1 6 Intercept 0.0319 1 1 1
2 6 Intercept -0.00463 1 2 2
3 6 Intercept -0.0217 1 3 3
4 6 Intercept -0.0230 1 4 4
5 6 Intercept 0.0167 1 5 5
6 6 Intercept -0.0143 1 6 6
We can get expected values (means) for individuals in our sample.
set.seed(9)
sample_id = sample(unique(d$id), replace = F, size = 3)
nd = data.frame(id = sample_id, id.f = sample_id)
add_epred_draws(newdata = nd, object = m2) %>%
  mean_qi()
# A tibble: 3 × 9
id id.f .row .epred .lower .upper .width .point .interval
<int> <int> <int> <dbl> <dbl> <dbl> <dbl> <chr> <chr>
1 48 48 3 0.213 0.162 0.264 0.95 mean qi
2 137 137 2 0.266 0.208 0.323 0.95 mean qi
3 146 146 1 0.205 0.148 0.262 0.95 mean qi
We can get predicted values (scores) for individuals in our sample.
add_predicted_draws(newdata = nd, object = m2) %>%
  mean_qi()
# A tibble: 3 × 9
id id.f .row .prediction .lower .upper .width .point .interval
<int> <int> <int> <dbl> <dbl> <dbl> <dbl> <chr> <chr>
1 48 48 3 0.212 0.108 0.316 0.95 mean qi
2 137 137 2 0.266 0.156 0.375 0.95 mean qi
3 146 146 1 0.203 0.0935 0.311 0.95 mean qi
expected values vs predicted values
expected = add_epred_draws(newdata = nd, object = m2)
predicted = add_predicted_draws(newdata = nd, object = m2)

full_join(expected, predicted) %>%
  filter(.draw <= 100) %>%
  ungroup() %>%
  ggplot(aes(x = id, y = .epred)) +
  stat_halfeye(aes(fill = "expected"),
               alpha = .3) +
  stat_halfeye(aes(y = .prediction, fill = "predicted"),
               alpha = .3) +
  scale_fill_manual(values = c("#e07a5f", "#1c5253")) +
  labs(x = "id", y = "con", fill = "draw") +
  scale_x_continuous(breaks = sample_id) +
  theme(legend.position = "top")
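Formally, the two kinds of draws differ only in whether residual noise is included. Using the notation from m2’s model, for person \(j\):

\[\begin{align*}
\text{expected value: } \mu_j &= \bar{\alpha} + \alpha_{\text{id}[j]} \\
\text{predicted value: } \tilde{y}_j &\sim \text{Normal}(\mu_j, \sigma)
\end{align*}\]

This is why the predicted intervals are wider than the expected intervals: they add the within-person noise \(\sigma\) on top of the posterior uncertainty in \(\mu_j\).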
Finally, we can get expected values for new individuals.
new_id = max(d$id) + 1:3
nd = data.frame(id = new_id)
add_epred_draws(newdata = nd, object = m2,
                allow_new_levels = T) %>%
  ggplot(aes(x = id, y = .epred)) +
  stat_halfeye()
adding covariates
Let’s add time to our model. Because time varies (can change) within person from assessment to assessment, this is a Level 1 variable. Note that I have NOT added a varying component to Level 2 – in other words, I’m stating that the effect of time on conscientiousness is fixed or identical across participants.
\[\begin{align*}
\text{Level 1} &\\
\text{con}_{ij} &= \alpha_i + \beta_i(\text{week}_{ij}) + \epsilon_{ij} \\
\text{Level 2} &\\
\alpha_i &= \gamma_0 + U_{0i} \\
\beta_i &= \gamma_1 \\
\end{align*}\]
m3 <- brm(
  data = d,
  family = gaussian,
  con ~ 1 + week + (1 | id),
  prior = c(prior(normal(.50, .25), class = Intercept),
            prior(normal(0, 1), class = b), # new prior for new term
            prior(exponential(1), class = sd),
            prior(exponential(1), class = sigma)),
  iter = 2000, warmup = 1000, chains = 4, cores = 4, seed = 9,
  file = here("files/models/m71.3")
)
Family: gaussian
Links: mu = identity; sigma = identity
Formula: con ~ 1 + week + (1 | id)
Data: d (Number of observations: 216)
Draws: 4 chains, each with iter = 2000; warmup = 1000; thin = 1;
total post-warmup draws = 4000
Multilevel Hyperparameters:
~id (Number of levels: 88)
Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
sd(Intercept) 0.06 0.01 0.05 0.07 1.01 1237 2024
Regression Coefficients:
Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
Intercept 0.19 0.01 0.18 0.21 1.00 2132 2576
week -0.00 0.00 -0.01 0.00 1.00 5783 3151
Further Distributional Parameters:
Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
sigma 0.05 0.00 0.04 0.05 1.00 2931 2859
Draws were sampled using sampling(NUTS). For each parameter, Bulk_ESS
and Tail_ESS are effective sample size measures, and Rhat is the potential
scale reduction factor on split chains (at convergence, Rhat = 1).
There are different intercepts for each participant, but not different slopes.
[1] "b_Intercept" "b_week" "sd_id__Intercept"
[4] "sigma" "Intercept" "r_id[6,Intercept]"
[7] "r_id[29,Intercept]" "r_id[34,Intercept]" "r_id[36,Intercept]"
[10] "r_id[37,Intercept]" "r_id[48,Intercept]" "r_id[53,Intercept]"
[13] "r_id[54,Intercept]" "r_id[58,Intercept]" "r_id[61,Intercept]"
[16] "r_id[66,Intercept]" "r_id[67,Intercept]" "r_id[71,Intercept]"
[19] "r_id[75,Intercept]" "r_id[76,Intercept]" "r_id[78,Intercept]"
[22] "r_id[79,Intercept]" "r_id[80,Intercept]" "r_id[81,Intercept]"
[25] "r_id[82,Intercept]" "r_id[85,Intercept]" "r_id[86,Intercept]"
[28] "r_id[87,Intercept]" "r_id[89,Intercept]" "r_id[91,Intercept]"
[31] "r_id[92,Intercept]" "r_id[93,Intercept]" "r_id[94,Intercept]"
[34] "r_id[96,Intercept]" "r_id[97,Intercept]" "r_id[98,Intercept]"
[37] "r_id[99,Intercept]" "r_id[101,Intercept]" "r_id[103,Intercept]"
[40] "r_id[104,Intercept]" "r_id[105,Intercept]" "r_id[106,Intercept]"
[43] "r_id[110,Intercept]" "r_id[112,Intercept]" "r_id[114,Intercept]"
[46] "r_id[115,Intercept]" "r_id[116,Intercept]" "r_id[120,Intercept]"
[49] "r_id[122,Intercept]" "r_id[125,Intercept]" "r_id[127,Intercept]"
[52] "r_id[129,Intercept]" "r_id[135,Intercept]" "r_id[136,Intercept]"
[55] "r_id[137,Intercept]" "r_id[140,Intercept]" "r_id[141,Intercept]"
[58] "r_id[142,Intercept]" "r_id[143,Intercept]" "r_id[144,Intercept]"
[61] "r_id[146,Intercept]" "r_id[149,Intercept]" "r_id[150,Intercept]"
[64] "r_id[152,Intercept]" "r_id[153,Intercept]" "r_id[155,Intercept]"
[67] "r_id[156,Intercept]" "r_id[159,Intercept]" "r_id[160,Intercept]"
[70] "r_id[162,Intercept]" "r_id[163,Intercept]" "r_id[165,Intercept]"
[73] "r_id[167,Intercept]" "r_id[169,Intercept]" "r_id[171,Intercept]"
[76] "r_id[174,Intercept]" "r_id[182,Intercept]" "r_id[187,Intercept]"
[79] "r_id[189,Intercept]" "r_id[190,Intercept]" "r_id[193,Intercept]"
[82] "r_id[194,Intercept]" "r_id[201,Intercept]" "r_id[204,Intercept]"
[85] "r_id[205,Intercept]" "r_id[208,Intercept]" "r_id[209,Intercept]"
[88] "r_id[211,Intercept]" "r_id[214,Intercept]" "r_id[219,Intercept]"
[91] "r_id[222,Intercept]" "r_id[223,Intercept]" "r_id[229,Intercept]"
[94] "lprior" "lp__" "accept_stat__"
[97] "stepsize__" "treedepth__" "n_leapfrog__"
[100] "divergent__" "energy__"
set.seed(9)
sample_id = sample(unique(d$id), replace = F, size = 20)
distinct(d, id, week) %>%
  filter(id %in% sample_id) %>%
  add_epred_draws(m3) %>%
  mean_qi() %>%
  ggplot(aes(x = week, y = .epred, color = as.factor(id))) +
  geom_point() +
  geom_line() +
  guides(color = "none") +
  labs(x = "week", y = "con")
level 2 covariates
We can also add covariates at Level 2, or the person level in this case. We have a binary variable, group, that takes a single value for each person (CTRL or PD).
\[\begin{align*}
\text{Level 1} &\\
\text{con}_{ij} &= \alpha_i + \beta_i(\text{week}_{ij}) + \epsilon_{ij} \\
\text{Level 2} &\\
\alpha_i &= \gamma_{0\text{group}[i]} + U_{0i} \\
\beta_i &= \gamma_{1} \\
\end{align*}\]
m4 <- brm(
  data = d,
  family = gaussian,
  con ~ 0 + week + group + (1 | id),
  prior = c(prior(normal(0, 1), class = b), # this prior still works!
            prior(exponential(1), class = sd),
            prior(exponential(1), class = sigma)),
  iter = 5000, warmup = 1000, chains = 4, cores = 4, seed = 9,
  file = here("files/models/m71.4")
)
Family: gaussian
Links: mu = identity; sigma = identity
Formula: con ~ 0 + week + group + (1 | id)
Data: d (Number of observations: 216)
Draws: 4 chains, each with iter = 5000; warmup = 1000; thin = 1;
total post-warmup draws = 16000
Multilevel Hyperparameters:
~id (Number of levels: 88)
Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
sd(Intercept) 0.06 0.01 0.05 0.07 1.00 5193 9057
Regression Coefficients:
Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
week -0.00 0.00 -0.01 0.00 1.00 26657 13146
groupCTRL 0.20 0.01 0.18 0.23 1.00 7225 9612
groupPD 0.19 0.01 0.17 0.20 1.00 6794 9456
Further Distributional Parameters:
Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
sigma 0.05 0.00 0.04 0.05 1.00 11021 11211
Draws were sampled using sampling(NUTS). For each parameter, Bulk_ESS
and Tail_ESS are effective sample size measures, and Rhat is the potential
scale reduction factor on split chains (at convergence, Rhat = 1).
[1] "b_week" "b_groupCTRL" "b_groupPD"
[4] "sd_id__Intercept" "sigma" "r_id[6,Intercept]"
[7] "r_id[29,Intercept]" "r_id[34,Intercept]" "r_id[36,Intercept]"
[10] "r_id[37,Intercept]" "r_id[48,Intercept]" "r_id[53,Intercept]"
[13] "r_id[54,Intercept]" "r_id[58,Intercept]" "r_id[61,Intercept]"
[16] "r_id[66,Intercept]" "r_id[67,Intercept]" "r_id[71,Intercept]"
[19] "r_id[75,Intercept]" "r_id[76,Intercept]" "r_id[78,Intercept]"
[22] "r_id[79,Intercept]" "r_id[80,Intercept]" "r_id[81,Intercept]"
[25] "r_id[82,Intercept]" "r_id[85,Intercept]" "r_id[86,Intercept]"
[28] "r_id[87,Intercept]" "r_id[89,Intercept]" "r_id[91,Intercept]"
[31] "r_id[92,Intercept]" "r_id[93,Intercept]" "r_id[94,Intercept]"
[34] "r_id[96,Intercept]" "r_id[97,Intercept]" "r_id[98,Intercept]"
[37] "r_id[99,Intercept]" "r_id[101,Intercept]" "r_id[103,Intercept]"
[40] "r_id[104,Intercept]" "r_id[105,Intercept]" "r_id[106,Intercept]"
[43] "r_id[110,Intercept]" "r_id[112,Intercept]" "r_id[114,Intercept]"
[46] "r_id[115,Intercept]" "r_id[116,Intercept]" "r_id[120,Intercept]"
[49] "r_id[122,Intercept]" "r_id[125,Intercept]" "r_id[127,Intercept]"
[52] "r_id[129,Intercept]" "r_id[135,Intercept]" "r_id[136,Intercept]"
[55] "r_id[137,Intercept]" "r_id[140,Intercept]" "r_id[141,Intercept]"
[58] "r_id[142,Intercept]" "r_id[143,Intercept]" "r_id[144,Intercept]"
[61] "r_id[146,Intercept]" "r_id[149,Intercept]" "r_id[150,Intercept]"
[64] "r_id[152,Intercept]" "r_id[153,Intercept]" "r_id[155,Intercept]"
[67] "r_id[156,Intercept]" "r_id[159,Intercept]" "r_id[160,Intercept]"
[70] "r_id[162,Intercept]" "r_id[163,Intercept]" "r_id[165,Intercept]"
[73] "r_id[167,Intercept]" "r_id[169,Intercept]" "r_id[171,Intercept]"
[76] "r_id[174,Intercept]" "r_id[182,Intercept]" "r_id[187,Intercept]"
[79] "r_id[189,Intercept]" "r_id[190,Intercept]" "r_id[193,Intercept]"
[82] "r_id[194,Intercept]" "r_id[201,Intercept]" "r_id[204,Intercept]"
[85] "r_id[205,Intercept]" "r_id[208,Intercept]" "r_id[209,Intercept]"
[88] "r_id[211,Intercept]" "r_id[214,Intercept]" "r_id[219,Intercept]"
[91] "r_id[222,Intercept]" "r_id[223,Intercept]" "r_id[229,Intercept]"
[94] "lprior" "lp__" "accept_stat__"
[97] "stepsize__" "treedepth__" "n_leapfrog__"
[100] "divergent__" "energy__"
set.seed(9)
sample_id = sample(unique(d$id), replace = F, size = 20)
distinct(d, id, group, week) %>%
  filter(id %in% sample_id) %>%
  add_epred_draws(m4) %>%
  mean_qi() %>%
  ggplot(aes(x = week, y = .epred, group = as.factor(id), color = group)) +
  geom_point() +
  geom_line() +
  scale_color_manual(values = c("#1c5253", "#e07a5f")) +
  labs(x = "week", y = "con") +
  theme(legend.position = "top")
So far, we’ve only added covariates to the estimates of the intercept. But we can also model the slope parameters at Level 2. Adding covariates to a slope creates an interaction (the effect of variable X on Y changes as a function of variable Z). Here, we instead let the slope vary across people, which likewise introduces additional parameters into our model.
\[\begin{align*}
\text{Level 1} &\\
\text{con}_{ij} &= \alpha_i + \beta_i(\text{week}_{ij}) + \epsilon_{ij} \\
\text{Level 2} &\\
\alpha_i &= \gamma_{0} + U_{0i} \\
\beta_i &= \gamma_{1} + U_{1i} \\
\end{align*}\]
m5 <- brm(
  data = d,
  family = gaussian,
  con ~ 1 + week + (1 + week | id),
  prior = c(prior(normal(0, 1), class = Intercept),
            prior(normal(0, 1), class = b),
            prior(exponential(1), class = sd),
            prior(exponential(1), class = sigma)),
  iter = 5000, warmup = 1000, chains = 4, cores = 4, seed = 9,
  file = here("files/models/m71.5")
)
Note the new hyperparameter in the output below: cor(Intercept, week).
Family: gaussian
Links: mu = identity; sigma = identity
Formula: con ~ 1 + week + (1 + week | id)
Data: d (Number of observations: 216)
Draws: 4 chains, each with iter = 5000; warmup = 1000; thin = 1;
total post-warmup draws = 16000
Multilevel Hyperparameters:
~id (Number of levels: 88)
Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
sd(Intercept) 0.06 0.01 0.05 0.07 1.00 3048 3184
sd(week) 0.00 0.00 0.00 0.01 1.00 1416 973
cor(Intercept,week) -0.07 0.48 -0.89 0.90 1.00 8793 8255
Regression Coefficients:
Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
Intercept 0.19 0.01 0.18 0.21 1.00 3511 5895
week -0.00 0.00 -0.01 0.00 1.00 10237 7760
Further Distributional Parameters:
Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
sigma 0.05 0.00 0.04 0.05 1.00 1856 1084
Draws were sampled using sampling(NUTS). For each parameter, Bulk_ESS
and Tail_ESS are effective sample size measures, and Rhat is the potential
scale reduction factor on split chains (at convergence, Rhat = 1).
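That cor(Intercept, week) hyperparameter appears because, with two varying effects per person, the model treats them as draws from a multivariate normal distribution (the standard varying-slopes setup; by default brms places an LKJ prior on the correlation):

\[
\begin{bmatrix} U_{0i} \\ U_{1i} \end{bmatrix} \sim
\text{MVNormal}\left(\begin{bmatrix} 0 \\ 0 \end{bmatrix},
\begin{bmatrix} \sigma^2_{U_0} & \rho\,\sigma_{U_0}\sigma_{U_1} \\ \rho\,\sigma_{U_0}\sigma_{U_1} & \sigma^2_{U_1} \end{bmatrix}\right)
\]

Here \(\rho\) is the estimated cor(Intercept, week): it tells us whether people with higher intercepts tend to have steeper (or shallower) slopes.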